LEXINGTON
ABSTRACT


The  Hopfield  model  neural  net  has  attracted  much  recent  attention.  One  use  of  the 
Hopfield  net  is  as  a  highly  parallel  content-addressable  memory,  where  retrieval  is 
possible  although  the  input  is  corrupted  by  noise.  For  binary  input  patterns,  an 
alternate  approach  is  to  compute  Hamming  distances  between  the  input  pattern  and 
each  of  the  stored  patterns  and  retrieve  that  stored  pattern  with  minimum  Hamming 
distance.  We  first  show  that  this  is  an  optimum  processor  when  the  noise  is 
statistically  independent  from  bit  to  bit.  We  then  present  a  Hamming  Neural  Net 
which  is  a  highly  parallel  implementation  of  this  algorithm  that  uses  computational 
elements  similar  to  those  used  in  a  Hopfield  net.  We  also  compare  the  Hopfield  and 
Hamming  nets  for  several  applications.  For  the  cases  considered,  the  Hamming  net 
generally  outperforms  the  Hopfield  net.  Also,  the  Hamming  net  requires  fewer 
interconnects  than  the  fully  connected  Hopfield  net. 
A COMPARISON OF HAMMING AND HOPFIELD NEURAL NETS FOR PATTERN CLASSIFICATION

1.  INTRODUCTION 

There  has  been  a  recent  upsurge  of  interest  in  neural  net  models  made  of  highly  parallel 
computational  elements  connected  in  a  pattern  that  is  reminiscent  of  biological  neural  nets.  In 
particular, much recent work has explored the ability of a neural model described by Hopfield [1,3]
to  serve  as  a  content-addressable  memory  and  as  a  pattern  classifier  for  binary  bit  patterns.  A 
content-addressable  memory  retrieves  one  of  M  stored  patterns  given  an  input  pattern  which  is  a 
noisy  version  of  a  stored  pattern.  A  classifier  determines  which  of  M  exemplar  patterns  is  most 
similar  to  a  noisy  input  pattern.  In  the  following  we  focus  on  the  classification  problem  because 
a  content-addressable  memory  is  essentially  a  classifier  which  outputs  the  exemplar  for  the 
selected  class  instead  of  an  index  to  the  class.  We  also  focus  on  the  classification  problem 
because  classification  is  a  fundamental  operation  that  is  essential  to  the  important  problems  of 
speech  and  image  recognition  whether  achieved  by  biological  or  artificial  means. 

Past studies have demonstrated that the Hopfield model can be used as a content-addressable
memory for random input patterns [1,3] and to classify binary patterns created from radar cross
sections [4], from consonants and vowels extracted from spoken words [3], and from lines in an
image [6]. These results demonstrate that a neural net based on the Hopfield model can perform
classification. In addition, Hopfield models have been successfully applied to other problems,
such as the travelling salesman problem, the A-D converter problem, and the signal decomposition
problem [7,8].

We  have  been  interested  in  a  specific  set  of  pattern  classification  problems  in  speech  and 
image  processing.  In  some  special  cases,  such  problems  can  be  formulated  in  maximum-likelihood 
terms  and  optimum  processors  can  be  derived.  In  particular,  if  each  element  in  a  binary  pattern 
is  perturbed  independently  by  noise,  the  optimum  processor  is  an  algorithm  that  measures 
Hamming  distances  between  the  perturbed  input  pattern  and  each  of  the  stored  patterns,  and 
selects  the  minimum. 

In  this  report,  we  derive  this  optimum-processor  result  and  then  show  how  a  neural  net 
model  called  the  Hamming  net  can  be  constructed  to  perform  this  algorithm.  We  then  compare 
implementations  and  performance  of  the  Hopfield  and  Hamming  nets  using  simulations  of  a 
visual  digit  recognition  task  and  a  bibliography  retrieval  task. 
2. OPTIMUM PROCESSOR FOR CLASSIFICATION OF BINARY PATTERNS


An optimum binary classifier must classify each binary input vector $x$ into one of $M$ classes
such that the probability of a classification error is minimized. Here, the input vector $x$ has
$N$ elements, each of which can be in one of two states denoted the +1 and the -1 states. Each of
the $M$ classes is represented by an exemplar binary vector $x^j$, where $j = 0, 1, 2, \ldots, M-1$
is the index to the class.

The classifier can be easily analyzed if each input vector to be classified is obtained by passing
each component of an exemplar through a discrete, noisy, memoryless channel, as shown in
Figure 1; i.e., the noise is independent from bit to bit. In Figure 1 the value of an element in the
+1 and -1 state is taken to be +1 and -1, respectively. In some further formulations these values
will be +1 and 0 instead.


[Figure 1 diagram: the exemplar input $x^j = +1, -1, +1, \ldots$ enters a discrete memoryless
channel; the corrupted output $x = -1, +1, -1, -1, \ldots$ is fed to the classifier.]

Figure 1. Generation of corrupted bit pattern by passing the exemplar for pattern class j through
a noisy discrete memoryless channel.


The  channel  in  Figure  1  is  defined  by  four  conditional  probabilities: 

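$$
\begin{aligned}
P(-1 \mid +1) &= \epsilon \\
P(+1 \mid +1) &= 1 - \epsilon \\
P(+1 \mid -1) &= \rho \\
P(-1 \mid -1) &= 1 - \rho
\end{aligned}
\tag{1}
$$

where

$$0.0 \le \epsilon < 0.5 \quad \text{and} \quad 0.0 \le \rho < 0.5 . \tag{2}$$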

Each  probability  in  Equation  (1)  is  the  conditional  probability  of  observing  a  specific  bit  at  the 
output  of  the  channel  given  a  specific  bit  at  the  input. 
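
As a minimal sketch of the channel in Equation (1), assuming the +1/-1 encoding of Figure 1, the following Python function perturbs each bit independently. The names `noisy_channel`, `eps` and `rho`, and the seeded `random.Random` generator, are illustrative choices rather than anything specified in this report.

```python
import random

def noisy_channel(exemplar, eps, rho, rng):
    """Pass a +1/-1 exemplar through the memoryless channel of Equation (1).

    eps = P(-1 | +1) and rho = P(+1 | -1); each bit is perturbed independently.
    """
    if not (0.0 <= eps < 0.5 and 0.0 <= rho < 0.5):
        raise ValueError("Equation (2) requires 0 <= eps < 0.5 and 0 <= rho < 0.5")
    out = []
    for bit in exemplar:
        flip_prob = eps if bit == +1 else rho
        out.append(-bit if rng.random() < flip_prob else bit)
    return out

rng = random.Random(0)  # fixed seed so the sketch is repeatable
exemplar = [+1, -1, +1, +1, -1, -1, +1, -1]
corrupted = noisy_channel(exemplar, eps=0.1, rho=0.2, rng=rng)
print(corrupted)
```

Setting both probabilities to zero returns the exemplar unchanged, which is a convenient sanity check.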

The binary classification problem created by a discrete memoryless channel is identical to the
classical communication theory problem of building a decoder to determine which of $M$ block
codes of length $N$ was sent over a noisy discrete memoryless channel. In our terminology,
however, the block codes are the exemplars. If the a priori probabilities of presenting exemplars
from different classes at the input to the noisy channel are equal, then the minimum error decoder
is a maximum likelihood decoder [9] that selects the class $j^*$ for which

$$P(x \mid x^{j^*}) \ge P(x \mid x^j); \quad \text{all } j \ne j^* \tag{3}$$

In this equation, $P(x \mid x^j)$ is the likelihood for class $j$, or the conditional probability
of observing the vector $x$ at the output of the noisy channel, given that exemplar $x^j$ was
presented at the input. A block diagram of an optimum maximum likelihood classifier is presented
in Figure 2. In this figure the input vector $x$ is presented at the left, likelihood values are
calculated in parallel, and then the class with the maximum likelihood value is selected.
Likelihood values are denoted $y_j$, where $y_j = P(x \mid x^j)$. The output is a vector $z$ whose
elements are zero except for that element corresponding to the class $j^*$ that satisfies
Equation (3).

Figure 2. Block diagram of optimum maximum likelihood classifier.
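
The block diagram can be sketched directly from the channel probabilities in Equation (1): compute one likelihood per class, then mark the largest. This is only an illustration under assumptions made here: the function names are hypothetical, the selected element of `z` is set to 1 (the text only says it is nonzero), and ties are broken in favor of the lowest class index.

```python
def likelihood(x, exemplar, eps, rho):
    """P(x | exemplar) for the channel of Equation (1), using the +1/-1 encoding."""
    p = 1.0
    for xi, ei in zip(x, exemplar):
        if ei == +1:
            p *= (1.0 - eps) if xi == +1 else eps
        else:
            p *= (1.0 - rho) if xi == -1 else rho
    return p

def ml_classify(x, exemplars, eps, rho):
    """Return (j_star, z): the class satisfying Equation (3) and the output vector z."""
    y = [likelihood(x, e, eps, rho) for e in exemplars]
    j_star = max(range(len(y)), key=lambda j: y[j])  # first maximum wins on ties
    z = [1 if j == j_star else 0 for j in range(len(y))]
    return j_star, z

exemplars = [
    [+1, +1, +1, +1, +1, +1],
    [-1, -1, -1, -1, -1, -1],
    [+1, +1, +1, -1, -1, -1],
]
x = [+1, +1, -1, -1, -1, -1]  # differs from exemplar 2 in a single bit
j_star, z = ml_classify(x, exemplars, eps=0.1, rho=0.1)
print(j_star, z)
```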


The exact form of the maximum likelihood decoder depends on the probabilities that define
the noisy channel. In all cases, however, we will show that the likelihoods are monotonically
related to functions equal to weighted sums of elements from the input vector. For example, in
the simplest case a binary symmetric channel is used and $\rho = \epsilon$. In this case it is
equally likely for a +1 state to be changed to a -1 state and vice versa, and

$$P(x \mid x^j) = \epsilon^{N^j_{\mathrm{ham}}} \, (1-\epsilon)^{N - N^j_{\mathrm{ham}}} \tag{4a}$$

where $N^j_{\mathrm{ham}}$ is the Hamming distance between $x$ and $x^j$. This is the number of
elements in the input which are not identical to the corresponding element in the exemplar for
class $j$. Equation (4a) simplifies to

$$P(x \mid x^j) = \left( \frac{\epsilon}{1-\epsilon} \right)^{N^j_{\mathrm{ham}}} (1-\epsilon)^N \tag{4b}$$

Since $\epsilon$ is less than 0.5, the first fraction on the right is less than 1.0 and
Equation (4b) is a maximum for that stored state with the smallest Hamming distance to the input.
A neural net that implements an optimum classifier for the binary symmetric channel must thus
calculate the Hamming distance to exemplars for all classes and then select that class which
produces a minimum. Instead of calculating the Hamming distance directly, we will calculate $N$
minus the Hamming distance and maximize this function.
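
Taking logarithms makes the monotonic relation explicit. The short derivation below uses only Equation (4b); the symbol $a$ is introduced here for convenience and does not appear in the report, and the case $\epsilon = 0$ is excluded so that the logarithm is defined.

```latex
\begin{align*}
\log P(x \mid x^j)
  &= N^j_{\mathrm{ham}} \log\frac{\epsilon}{1-\epsilon} + N \log(1-\epsilon) \\
  &= -a \, N^j_{\mathrm{ham}} + N \log(1-\epsilon),
  \qquad a = \log\frac{1-\epsilon}{\epsilon} > 0
  \quad \text{for } 0 < \epsilon < 0.5 .
\end{align*}
```

Because $a > 0$ and the term $N \log(1-\epsilon)$ is the same for every class, the class with the largest likelihood is the class with the smallest Hamming distance.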

$N$ minus the Hamming distance can be calculated from a weighted sum of the elements of
the input vector. If the elements of the input vector take on the values +1 and -1 for the +1 and
-1 states, respectively, then

$$N - N^j_{\mathrm{ham}} = c_j + \sum_{i=0}^{N-1} w_{ij} x_i \tag{5a}$$

In this equation,

$$w_{ij} = \frac{x^j_i}{2} \tag{5b}$$

and

$$c_j = \frac{N}{2} \tag{5c}$$

Here $x^j_i$ is the value of element $i$ of the exemplar for class $j$. When all elements in the
input vector match an exemplar exactly, each element in the sum of Equation (5a) adds 1/2, and the
total, including the offset $c_j$, is $N$. Whenever an element in the input vector does not match
the corresponding element in the exemplar, the prior total is decremented by 1 as required.
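
Equations (5a) to (5c) can be checked numerically. The sketch below, with hypothetical function names, compares the weighted sum against a direct count of matching bits for a small +1/-1 pattern.

```python
def hamming_distance(x, exemplar):
    """Number of positions where the input differs from the exemplar."""
    return sum(1 for xi, ei in zip(x, exemplar) if xi != ei)

def matches_pm1(x, exemplar):
    """N minus the Hamming distance via Equations (5a)-(5c), for +1/-1 values."""
    n = len(exemplar)
    c = n / 2.0                         # offset, Equation (5c)
    w = [ei / 2.0 for ei in exemplar]   # weights, Equation (5b)
    return c + sum(wi * xi for wi, xi in zip(w, x))  # Equation (5a)

exemplar = [+1, -1, +1, +1, -1, -1]
x = [+1, +1, +1, -1, -1, -1]            # two bits differ from the exemplar
print(matches_pm1(x, exemplar), len(x) - hamming_distance(x, exemplar))
```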

An alternative derivation was suggested in [11], where it is assumed that elements of the input
vector $x$ take on the values 0 and +1 for the -1 and +1 states, respectively. In this case $N$
minus the Hamming distance can be calculated from:

$$N - N^j_{\mathrm{ham}} = c_j + \sum_{i=0}^{N-1} w_{ij} x_i \tag{6a}$$

In this equation,

$$
w_{ij} =
\begin{cases}
+1 & \text{if } x^j_i = +1 \\
-1 & \text{if } x^j_i = 0
\end{cases}
\tag{6b}
$$

$$c_j = N^j_z = N - \sum_{i=0}^{N-1} x^j_i \tag{6c}$$

Here $N^j_z$ represents the number of elements in the exemplar for class $j$ that are zero. When
all elements in the input vector match an exemplar exactly, the sum in (6a) adds up the number of
positive input elements. This, added to the number of zero input elements, results in $N$ as
desired. The sum is reduced by one whenever a zero input element that matches an exemplar becomes
positive, and whenever a positive input element that matches an exemplar becomes zero.
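
The same check can be made for the 0/1 encoding. In this sketch (function names are illustrative), the offset is the number of zeros in the exemplar and the weights are +1 or -1 as in Equations (6b) and (6c).

```python
def hamming_distance(x, exemplar):
    """Number of positions where the input differs from the exemplar."""
    return sum(1 for xi, ei in zip(x, exemplar) if xi != ei)

def matches_01(x, exemplar):
    """N minus the Hamming distance via Equations (6a)-(6c), for 0/1 values."""
    n = len(exemplar)
    n_zero = n - sum(exemplar)                       # offset c_j, Equation (6c)
    w = [+1 if ei == 1 else -1 for ei in exemplar]   # weights, Equation (6b)
    return n_zero + sum(wi * xi for wi, xi in zip(w, x))  # Equation (6a)

exemplar = [1, 0, 1, 1, 0, 0]
x = [1, 1, 1, 0, 0, 0]                               # two bits differ
print(matches_01(x, exemplar), len(x) - hamming_distance(x, exemplar))
```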

In the more general situation, the noisy channel is not symmetric and $\rho \ne \epsilon$. In this
case it is more likely that the +1 state will change to the -1 state or vice versa, and it is not
sufficient to simply calculate the Hamming distance. For simplicity, we present results for the
case when elements of the input vector $x$ take on the values 0 and +1 for the -1 and +1 states,
respectively.

We will also maximize $\log P(x \mid x^j)$, denoted $L_j$, instead of $P(x \mid x^j)$. In this case

$$
L_j = \sum_{\forall x^j_i = 1} x_i \log\left(\frac{1-\epsilon}{\epsilon}\right)
    + (N - N^j_z) \log(\epsilon)
    + \sum_{\forall x^j_i = 0} x_i \log\left(\frac{\rho}{1-\rho}\right)
    + N^j_z \log(1-\rho) . \tag{7a}
$$

This can be written as a weighted sum of elements of the input vector as was done in
Equation (5a) and Equation (6a):

$$L_j = c_j + \sum_{i=0}^{N-1} w_{ij} x_i . \tag{7b}$$

In this equation

$$
w_{ij} =
\begin{cases}
\log\left(\dfrac{1-\epsilon}{\epsilon}\right) & \text{if } x^j_i = +1 \\[2ex]
\log\left(\dfrac{\rho}{1-\rho}\right) & \text{if } x^j_i = 0
\end{cases}
\tag{7c}
$$

$$c_j = (N - N^j_z) \log(\epsilon) + N^j_z \log(1-\rho), \tag{7d}$$

with $N^j_z$ as in (6c). When $\rho = \epsilon$, (7a) reduces to a form similar to Equation (6a).
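
To illustrate that the weighted sum in Equations (7b) to (7d) reproduces the log likelihood, the sketch below compares it with a bit-by-bit evaluation using the channel probabilities of Equation (1). The function names and the example values of `eps` and `rho` are assumptions made for this illustration; both probabilities must be strictly positive for the logarithms to exist.

```python
import math

def direct_log_likelihood(x, exemplar, eps, rho):
    """log P(x | exemplar), accumulated bit by bit (0/1 encoding)."""
    total = 0.0
    for xi, ei in zip(x, exemplar):
        if ei == 1:   # +1 state sent: flipped with probability eps
            total += math.log(1.0 - eps) if xi == 1 else math.log(eps)
        else:         # -1 state sent (encoded as 0): flipped with probability rho
            total += math.log(rho) if xi == 1 else math.log(1.0 - rho)
    return total

def weighted_sum_log_likelihood(x, exemplar, eps, rho):
    """L_j computed as an offset plus a weighted sum, Equations (7b)-(7d)."""
    n = len(exemplar)
    n_zero = n - sum(exemplar)
    c = (n - n_zero) * math.log(eps) + n_zero * math.log(1.0 - rho)   # (7d)
    w = [math.log((1.0 - eps) / eps) if ei == 1 else math.log(rho / (1.0 - rho))
         for ei in exemplar]                                          # (7c)
    return c + sum(wi * xi for wi, xi in zip(w, x))                   # (7b)

exemplar = [1, 0, 1, 1, 0, 0]
x = [1, 1, 1, 0, 0, 0]
a = direct_log_likelihood(x, exemplar, eps=0.1, rho=0.3)
b = weighted_sum_log_likelihood(x, exemplar, eps=0.1, rho=0.3)
print(a, b)
```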
3. NEURAL NET IMPLEMENTATION OF OPTIMUM PROCESSOR


In the previous section it was demonstrated that the optimum processor always forms weighted
sums of the elements of the input vector and picks the maximum from these sums. In this
section we demonstrate that this processor can be implemented using an artificial neural net. The
question of what algorithms can be implemented using neural nets is of interest because of the
potential usefulness of such nets.

An artificial neural net is a highly parallel network with many interconnections between
analog computational elements or nodes. The simplest node forms the sum of $N$ weighted inputs
presented on $N$ input links and passes the result through a nonlinearity out on one output link.
Neural nets almost always include an inherent nonlinearity and require primarily local
connectivity between nodes. In addition, the weights on the input links can be adapted based on
information concerning the correctness of the output. Artificial neural nets are of interest
primarily because they may be able to emulate the speed and performance of real biological neural
nets using many simple slow computational elements operating in parallel. They thus offer one
possible solution to the problem of obtaining the massive parallelism and computational capacity
that are presumed to be required for such problems as speech recognition.

Two neural nets are logically required to implement an optimum classifier for binary patterns.
One net forms the weighted sums to calculate quantities related to the likelihood of the different
classes and the second picks the maximum.

A net that forms weighted sums is presented in Figure 3. The topology of this net is similar
to that of a perceptron [12]. An input pattern $x$ is applied at the bottom of this net and an
output pattern $y$ is produced at the top. The first layer of nodes sends values of the input
pattern to the links feeding the second layer. The second layer of nodes uses nonlinear threshold
logic elements [10] to sum weighted values of the inputs and add internal offsets. Output values
from the second layer are

$$y_j = f\left( c_j + \sum_{i=0}^{N-1} w_{ij} x_i \right) \tag{8a}$$

where

$$
f(\alpha) =
\begin{cases}
\alpha & \text{if } \alpha > 0 \\
0 & \text{if } \alpha \le 0 .
\end{cases}
\tag{8b}
$$

In these equations, $f(\alpha)$ is a nonlinear function that models the nonlinearity inherent in a
biological neuron†, $c_j$ is an internal offset associated with each threshold logic node, and
$w_{ij}$ are positive or negative weights associated with the links. The internal offsets and
weights are selected differently for the three different situations described in the previous
section. A binary symmetric channel with $\rho = \epsilon$ requires the weights and offsets in
Equation (5) if elements of the input vector take on the values +1 and -1, and the weights and
offsets in Equation (6) if elements take on the values 0 and 1. In the more general case when
$\rho \ne \epsilon$, and the inputs take on the values 0 and 1, the weights and offsets are as in
Equation (7). The resultant output values $y_j$ are $N - N^j_{\mathrm{ham}}$ for the binary
symmetric channel, and $L_j$ for the general case.

† Biological neurons saturate for large enough $\alpha$. In this discussion we are interested in
the monotonically increasing portion of the input-output characteristic.

Figure 3. Feed-forward perceptron neural net used to calculate values related to the likelihood of
each of M pattern classes for patterns with N elements. All nodes are analog-threshold logic nodes
with internal thresholds set to zero. Weights depend on the stored states.
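
A minimal sketch of the second layer for the binary symmetric channel with +1/-1 inputs follows, using the offset $N/2$ and weights $x^j_i/2$ of Equation (5) inside the node equation (8a). The function names and the three short exemplars are made up for the example.

```python
def threshold_logic(a):
    """Threshold logic nonlinearity of Equation (8b)."""
    return a if a > 0 else 0.0

def first_layer_pm1(x, exemplars):
    """Outputs y_j of Equation (8a) with the weights and offsets of Equation (5)."""
    n = len(x)
    outputs = []
    for exemplar in exemplars:
        c = n / 2.0                                             # offset c_j
        a = c + sum((ei / 2.0) * xi for ei, xi in zip(exemplar, x))
        outputs.append(threshold_logic(a))
    return outputs

exemplars = [[+1, +1, +1, +1], [-1, -1, -1, -1], [+1, -1, +1, -1]]
y = first_layer_pm1([+1, -1, +1, -1], exemplars)
print(y)
```

Each output equals the number of bits in which the input agrees with that exemplar, so an exact match produces the value N.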


A number of different nets can be used to pick the maximum value from the $y_j$ outputs of
the perceptron net. In situations where it is only important to know when the input matches a
stored state very closely, it is sufficient to identify those second-level nodes in Figure 3 with
output values that exceed a specified threshold. This can be performed by modifying the constant
added in (8a) such that only the outputs of those nodes corresponding to closely matching stored
states are positive. For example, for the binary symmetric channel with +1 and -1 inputs, if $c_j$
in (8a) is changed to $\Delta - N/2$, then only nodes corresponding to exemplars with a Hamming
distance less than $\Delta$ from the input will have positive outputs.
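
The thresholding idea can be sketched as follows, assuming the modified offset is $\Delta - N/2$, so that with the weights of Equation (5b) the net input to node $j$ equals $\Delta$ minus the Hamming distance. The function name `close_match_outputs` and the parameter `delta` are hypothetical.

```python
def close_match_outputs(x, exemplars, delta):
    """Second-layer outputs with the offset changed from N/2 to delta - N/2.

    With weights x_i^j / 2, the node input is delta - N_ham, so the output is
    positive only when the Hamming distance is less than delta.
    """
    n = len(x)
    c = delta - n / 2.0
    outputs = []
    for exemplar in exemplars:
        a = c + sum((ei / 2.0) * xi for ei, xi in zip(exemplar, x))
        outputs.append(a if a > 0 else 0.0)   # threshold logic, Equation (8b)
    return outputs

exemplars = [[+1, +1, +1, +1], [-1, -1, -1, -1], [+1, -1, +1, -1]]
out = close_match_outputs([+1, -1, +1, -1], exemplars, delta=1)
print(out)
```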


In the more general situation, a net must select the maximum over the $M$ values $y_j$. We have
developed three topologically different neural net structures which perform this task. These nets
maintain the highly parallel structure necessary to achieve the theoretical computation speed-up
provided by multiple processors. They could thus be used in a larger system when their outputs
feed other nets without compromising the overall computational speed of the system. One
feed-forward net uses a brute force approach to perform binary comparisons between all input
values. A second feed-forward net uses a binary tree to reduce the number of nodes required.
Finally, a third fully-connected net sometimes called a "winner-take-all" net uses strong
inhibition between nodes and iterates until the maximum is found.
A brute force feed-forward net that picks the maximum from eight inputs is presented in
Figure 4. The inputs labelled $y_0$ through $y_7$ are on the bottom of the net, and the outputs
labelled $z_0$ through $z_7$ are along the left diagonal. This network is designed such that only
the output corresponding to the maximum input will be positive. The filled circles in this net
represent hard-limit nodes that compare the values of all inputs. Hard-limit nodes are similar to
threshold logic nodes (Equations 8a and 8b) except the function $f$ is defined by

$$
f(\alpha) =
\begin{cases}
1 & \text{if } \alpha > 0 \\
0 & \text{if } \alpha \le 0 .
\end{cases}
\tag{9a}
$$

These nodes are simpler to implement than threshold-logic nodes because linearity above
threshold is not required.

The outputs, $z_j$, in Figure 4 are

$$z_j = f\left[ \sum_{i > j} f(y_j - y_i) - \sum_{i < j} f(y_i - y_j) + j - (M - 1.5) \right] . \tag{9b}$$

where $f$ is as in Equation (9a) and $0 \le i, j < M$.

Hard-limit nodes in Figure 4 perform the binary comparisons required in Equation (9b) between
all inputs. Internal thresholds in the output nodes and weights are set such that an output node
is positive only when the results of all binary comparisons with the associated input indicate
that that input is greatest. A limitation of this net is that it becomes very large for large $M$
because it requires $O(M^2)$ nodes to pick the maximum of $M$ inputs. For example, 5050 nodes are
required to pick the maximum of 100 inputs.
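
Equation (9b) can be evaluated directly, as in the sketch below. The function names are illustrative. The node count of one comparison node per input pair plus one output node per input is a reading adopted here because it reproduces the figure of 5050 nodes for 100 inputs; the report does not spell out this breakdown.

```python
def hard_limit(a):
    """Hard-limit nonlinearity of Equation (9a)."""
    return 1 if a > 0 else 0

def brute_force_max(y):
    """Outputs z_j of Equation (9b); only the maximum input gives z_j = 1."""
    m = len(y)
    z = []
    for j in range(m):
        wins = sum(hard_limit(y[j] - y[i]) for i in range(j + 1, m))  # i > j
        losses = sum(hard_limit(y[i] - y[j]) for i in range(j))       # i < j
        z.append(hard_limit(wins - losses + j - (m - 1.5)))
    return z

def node_count(m):
    """Assumed count: one comparison node per pair plus one output node per input."""
    return m * (m - 1) // 2 + m

z = brute_force_max([3, 9, 4, 1, 7, 2, 8, 5])
print(z, node_count(100))
```

For eight inputs the term $j - (M - 1.5)$ takes the values -6.5, -5.5, ..., +0.5, which are the output-node offsets quoted in the caption of Figure 4.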


A neural net that is more efficient for large numbers of inputs can be built using the two-input
comparator subnet presented in Figure 5. This subnet produces logical outputs ($z_0$, $z_1$)
indicating which input was maximum and also one analog output (max) equal to the maximum
value. Whenever the inputs ($y_0$ and $y_1$) differ, only the logical output corresponding to the
maximum input will be nonzero, and the value of max will be that of the maximum input. The
comparator subnet uses threshold logic nodes represented by open circles in Figure 5, and
hard-limit nodes represented by filled circles. In addition, all internal offsets ($c_j$ in
Equation (8a)) in Figure 5 are zero.
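
One way to realize such a comparator with only threshold logic and hard-limit nodes is sketched below. The internal weights of Figure 5 are not given in the text, so the weights of 0.5 used here are an assumption; they rely on the identity max(a, b) = (a + b + |a - b|) / 2, where |a - b| is the sum of the two threshold logic differences.

```python
def threshold_logic(a):
    """Threshold logic nonlinearity of Equation (8b)."""
    return a if a > 0 else 0.0

def hard_limit(a):
    """Hard-limit nonlinearity of Equation (9a)."""
    return 1 if a > 0 else 0

def comparator(y0, y1):
    """Return (z0, z1, maximum) for two non-negative inputs.

    The 0.5 weights are assumed for illustration, not taken from Figure 5.
    """
    d0 = threshold_logic(y0 - y1)   # positive only when y0 is larger
    d1 = threshold_logic(y1 - y0)   # positive only when y1 is larger
    z0 = hard_limit(d0)
    z1 = hard_limit(d1)
    maximum = threshold_logic(0.5 * y0 + 0.5 * y1 + 0.5 * d0 + 0.5 * d1)
    return z0, z1, maximum

print(comparator(3.0, 7.0))
print(comparator(7.0, 3.0))
```

With equal inputs both logical outputs are zero in this sketch, which is consistent with the statement that the logical outputs are defined whenever the inputs differ.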


A network that picks the maximum of $M$ inputs can be built using comparator subnets
arranged in a layered binary tree. Such a net includes $M-1$ comparator subnets arranged in
roughly $\log_2 M$ layers when the maximum of $M$ inputs must be selected. For example, it requires
only 594 nodes to pick the maximum from 100 inputs. An example of such a net that picks the
maximum of eight inputs is presented in Figure 6. The inputs in this net are at the bottom, and the
outputs are at the top. Four comparator subnets in the first layer, two comparator subnets in the
second layer, and one partial subnet in the third layer from the bottom of this net are used to
find the maximum input. The maximum value is fed forward from the input through the
threshold-logic nodes (open circles) to the final partial subnet. The decisions concerning which
input was maximum are fed forward from the input through the hard-limit nodes (filled circles) to
the output nodes. After the input propagates to the output, only that output corresponding to the
maximum input will be high.

Figure 4. Feed-forward neural net that determines which of eight inputs is maximum by explicitly
performing all binary comparisons. Nodes denoted by filled circles are hard-limit nodes and nodes
denoted by open circles are analog-threshold logic nodes. Internal thresholds on all nodes except
the output nodes are zero. Internal thresholds on nodes $z_0, z_1, \ldots, z_6, z_7$ are
-6.5, -5.5, ..., -0.5, +0.5, respectively. Weights are either +1 (open arrows) or -1 (filled
arrows).

Figure 5. Comparator subnet that selects the maximum of two inputs. Internal thresholds on both
hard-limit nodes (filled circles) and analog-threshold logic nodes (open circles) are zero.

[Figure 6 diagram: inputs $y_0$ through $y_7$ at the bottom, outputs $z_0$ through $z_7$ at the
top.]

Figure 6. Feed-forward neural net that determines which of eight inputs is maximum using a binary
tree and comparator subnets from Figure 5. Internal thresholds on both hard-limit nodes (filled
circles) and analog-threshold logic nodes (open circles) are zero except for the output nodes.
Internal thresholds on nodes $z_0, z_1, \ldots, z_6, z_7$ are -2.5. Weights for all comparator
subnets in this net are as in Figure 5. All other weights are +1.
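
The tree arrangement can be sketched in software as a tournament. The code below counts pairwise comparisons and layers rather than individual nodes, so it illustrates the $M-1$ comparators and roughly $\log_2 M$ layers but makes no attempt to reproduce the 594-node figure. The function name, the handling of an odd contender (it advances unchanged) and the tie rule (the earlier input wins) are choices made for this example.

```python
def tree_max(y):
    """Return (index of maximum, comparators used, layers used)."""
    contenders = list(enumerate(y))      # (original index, value) pairs
    comparators = 0
    layers = 0
    while len(contenders) > 1:
        next_round = []
        for k in range(0, len(contenders) - 1, 2):
            a, b = contenders[k], contenders[k + 1]
            next_round.append(a if a[1] >= b[1] else b)   # one comparator subnet
            comparators += 1
        if len(contenders) % 2 == 1:
            next_round.append(contenders[-1])             # odd one out advances
        contenders = next_round
        layers += 1
    return contenders[0][0], comparators, layers

print(tree_max([3, 9, 4, 1, 7, 2, 8, 5]))
print(tree_max(list(range(100))))
```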

The above two nets use strictly feed-forward connections and are relatively large. A less
complex net that uses feedback connections, and will be referred to as a maxnet, is presented in
Figure 7. This net is motivated by the large numbers of connections in biological neural nets and
by laterally interconnected networks described by Kohonen [12]. Although this net is similar in
structure to the Hopfield net [1], it uses threshold-logic nodes instead of hard-limit nodes and
feeds the output of each node back to its input instead of disallowing this feedback path. The
maxnet is a fully connected net made up of only $M$ threshold logic nodes with internal thresholds
set to zero. Input values are applied at time zero through the input nodes on the bottom of
Figure 7. This initializes node outputs for each node at time zero ($\mu_j(0)$) to the input
values:

$$\mu_j(0) = y_j, \quad j = 0, 1, \ldots, M-2, M-1 \tag{10a}$$


[Figure 7 diagram: outputs $z_0, z_1, \ldots, z_{M-2}, z_{M-1}$ at the top.]

Figure 7. Iterative neural net denoted "maxnet" that determines which of M inputs is the maximum.
The inputs are applied prior to time zero and then removed, and the outputs are valid after the
network converges. All nodes are analog-threshold logic nodes with internal thresholds equal to
zero. Each node connects to itself and all other nodes. Weights are -w (solid arrows) or +1 (open
arrows).


The network then iterates to find the maximum via the following equation:

$$\mu_j(t+1) = f\left[ \mu_j(t) - \sum_{i \ne j} w_{ij} \, \mu_i(t) \right] \tag{10b}$$

In this equation $f$ is the threshold logic function described in Equation (8b) and $w_{ij}$ is
the inhibitory weight between nodes. Each node inhibits all other nodes with a value equal to the
node's output multiplied by a small negative weight. Each node also feeds back to itself with
unity gain. After convergence, only that output node corresponding to the maximum input will have
a nonzero value. This value will, in general, be less than the original, time zero, value of that
node.

The output values of the net are thus simply the node output values after convergence:

$$z_j = \mu_j(\infty), \quad j = 0, 1, \ldots, M-2, M-1 . \tag{10c}$$

The maxnet will converge and find the maximum input when

$$w_{ij} = w < \frac{1}{M-1} \tag{10d}$$

where, by convergence, we mean the node outputs stop changing and only the output of one node
corresponding to the maximum input is positive.
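
The iteration can be sketched in a few lines. The code below applies the update of Equation (10b) with a single inhibitory weight `w` that satisfies Equation (10d) and stops once at most one node is positive. The function name, the iteration cap `max_iters` and the example values are assumptions made here; the cap exists because exactly tied maxima inhibit each other equally and never leave a single winner.

```python
def threshold_logic(a):
    """Threshold logic nonlinearity of Equation (8b)."""
    return a if a > 0 else 0.0

def maxnet(y, w, max_iters=1000):
    """Iterate Equation (10b); return (final node outputs, iterations used)."""
    m = len(y)
    if m < 2 or not (0.0 < w < 1.0 / (m - 1)):
        raise ValueError("need at least two nodes and 0 < w < 1/(M-1), Equation (10d)")
    mu = [float(v) for v in y]            # Equation (10a)
    for t in range(max_iters):
        if sum(1 for v in mu if v > 0) <= 1:
            return mu, t                  # converged: at most one positive node
        total = sum(mu)
        # inhibition to node j is w times the sum of all other node outputs
        mu = [threshold_logic(v - w * (total - v)) for v in mu]
    raise RuntimeError("no single winner within max_iters (tied maxima?)")

z, iterations = maxnet([2.0, 2.0, 4.0], w=0.4)
print(z, iterations)
```

In this small example the two smaller nodes are driven to zero in one iteration and the surviving output is smaller than its initial value, as the text describes.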

The proof of convergence depends primarily on the fact that the inhibition to the node
containing the maximum value is always less than the inhibition to other nodes, and that at
convergence the inhibition to the node with the maximum value reduces to zero. Denote the total
inhibition in Equation (10b) from all other nodes to node $j$ as $\mathrm{inhib}_j$, where:

$$\mathrm{inhib}_j(t) = \sum_{i \ne j} w_{ij} \, \mu_i(t) \tag{11}$$

If node $j^*$ corresponds to the maximum input, then on the first iteration
$\mathrm{inhib}_{j^*}(1)$ will be less than the inhibition to all other nodes. This follows
because all node outputs are positive and the sum of all outputs, excluding one in Equation (11),
will be minimum when the maximum is excluded. Node $j^*$ will thus remain the maximum after the
first iteration. By induction, it will remain the maximum over all iterations.

The remainder of the proof of convergence depends on demonstrating that the output of
node $j^*$ is never driven to zero but the outputs of all other nodes are. When Equation (10d)
is satisfied, $\mathrm{inhib}_j(t)$ is always less than the average value of all other node
outputs. The inhibition to node $j^*$ will thus be less than the average of the outputs of all
other nodes. Whenever a maximum exists, this inhibition will always be less than the current
output of node $j^*$ because the maximum of a set of positive numbers is always greater than the
average. The output of node $j^*$ will thus not be driven to zero while any other nodes have
non-zero outputs. After all other node outputs are driven to zero, the inhibition to node $j^*$
drops to zero, and the output of $j^*$ remains constant. The outputs of all other nodes will
always be driven to zero because the inhibition to these nodes remains positive on all iterations
and approaches a positive constant as time increases. In practice, the maxnet will still converge
and find the maximum when each weight $w_{ij}$ is set to a value satisfying Equation (10d) plus a
small random component. This forces the net to find a maximum when the inputs to all nodes are
identical.
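
The central step of this argument can be written as a chain of inequalities. The following is a restatement under the assumption of a common weight $w$ as in Equation (10d), valid while at least one node other than $j^*$ still has a positive output:

```latex
\[
\mathrm{inhib}_{j^*}(t)
  = w \sum_{i \ne j^*} \mu_i(t)
  < \frac{1}{M-1} \sum_{i \ne j^*} \mu_i(t)
  \le \mu_{j^*}(t),
\qquad\text{so}\qquad
\mu_{j^*}(t+1) = \mu_{j^*}(t) - \mathrm{inhib}_{j^*}(t) > 0 .
\]
```

The middle term is the average of the other $M-1$ node outputs, which cannot exceed the largest output.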

The behavior of the maxnet is illustrated in Figure 8. This figure presents the node outputs
for a 10-node maxnet at iterations 0 through 3 when the net converged. The initial values to the
net come from the perceptron likelihood calculation net presented in Figure 3 when the number
of classes ($M$) is 10, the number of input nodes ($N$) is 100, and the input to the classifier is
the exemplar pattern for class number 5. Exemplars were selected by randomly setting each element
in every exemplar to +1 or -1 with equal probability. The output values from the maxnet nodes
range from zero to 100, which is the number of input nodes in the classifier. At time zero, node 5
has a value of 100 because the input pattern exactly matches the exemplar for class 5. All other
nodes have values from a binomial distribution with a mean of $N/2 = 50$ and a standard deviation
of $\sqrt{0.25N} = 5$. This occurs because the Hamming distance has a binomial distribution when
exemplars are selected at random as described above. After the first iteration, output values are
reduced by the average of all nodes at time zero and the outputs of six nodes remain positive.
Output values are then reduced further on the second iteration where only three outputs remain
positive. Finally, only the maximum output remains positive after the third iteration.

Figure 8. Node outputs in a 10-node maxnet at iterations zero through 3 when the net converged.
Initial values come from the perceptron net presented in Figure 3 when the number of classes (M)
is 10, the number of input nodes (N) is 100, and the input to the perceptron net is the exemplar
pattern for class number 5.


A  similar  example  is  presented  in  Figure  9  for  a  maxnet  with  100  instead  of  10  nodes  where 
the  number  of  input  nodes  to  the  perceptron  likelihood  calculation  net  is  1000.  As  can  be  seen, 
the  number  of  iterations  required  for  convergence  increases  only  slightly  from  3  to  9.  This 
increase  is  primarily  caused  by  the  greater  number  of  nodes  with  values  at  the  high  end  of  the 
binomial  distribution  when  there  are  100  nodes. 


Convergence is slower when the peak value across nodes in the maxnet is less distinct. This
is illustrated in Figure 10. This figure is similar to Figure 9 except the input to the perceptron
net was passed through a noisy binary symmetric channel where the probability of changing a bit
($\rho$) was set to 0.4. The initial value of node 50 is roughly 600 because only roughly 60% of
the bits in the input match the exemplar for class 50. Other nodes still have a binomial
distribution with mean 500 and standard deviation of $\sqrt{0.25N} = 15.8$. As can be seen, the
network converges in 27 iterations. Similar experiments were performed when the number of classes
($M$) in the classifier ranged from 2 to 100, the number of elements in the patterns ($N$) was set
to $10M$, and the probability of error in the binary symmetric channel ($\rho$) ranged from 0.0 to
0.5. Results are presented in Figure 11.
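
The initial node values quoted for Figures 8 to 10 follow from simple binomial arithmetic, sketched below with hypothetical function names. A node whose exemplar is unrelated to the input agrees with it in each bit with probability 0.5, and the node whose exemplar was sent through the channel agrees with probability one minus the bit-change probability.

```python
import math

def unmatched_node_stats(n):
    """Mean and standard deviation of N - N_ham against an unrelated random exemplar."""
    return n / 2.0, math.sqrt(0.25 * n)

def matched_node_mean(n, flip_prob):
    """Expected N - N_ham for the class whose exemplar was sent through the channel."""
    return n * (1.0 - flip_prob)

print(unmatched_node_stats(100))     # setting of Figure 8, N = 100
print(unmatched_node_stats(1000))    # setting of Figures 9 and 10, N = 1000
print(matched_node_mean(1000, 0.4))  # node 50 in Figure 10
```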

Figure 11 presents the average number of iterations until convergence for the maxnet of
Figure 7 versus the probability of error in the binary symmetric channel for the above cases. It
was obtained from Monte Carlo experiments examining 100 different runs per point. As can be seen,
the average number of iterations required for convergence is small (<10) for as many as 100
classes when there is a well-defined peak and the probability of error is less than 0.1. The
average number of iterations also does not grow strongly with the number of nodes in the maxnet.
The average number of iterations required for convergence rises gradually but remains below 20
when the probability of error is 0.3. Above this level, the average number of iterations rises to
a value that is roughly 10% above $M$ when the probability of error is 0.5. These results
demonstrate the utility and robustness of the maxnet. The net always converges and finds the node
with the maximum value, and the number of iterations required for convergence does not grow
rapidly as the number of classes increases. Furthermore, the number of iterations increases only
when the error rate of the classifier is large and the utility of the classifier itself is
questionable.
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NODE  NUMBER  NODE  NUMBER 

Figure  9.  Node  outputs  in  a  100-node  maxnet  at  iterations  zero  through  9  when  the  net  converged .  Initial  values  come 
from  the  perceptron  net  presented  in  Figure  3  w  hen  the  number  of  classes  (M)  is  100,  the  number  of  input  nodes  (N)  is 
1000,  and  the  input  to  the  classifier  is  the  exemplar  pattern  for  class  number  50. 
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Figure  10.  Node  outputs  in  the  100-node  maxnet  of  Figure  9  w  hen  the  input  to  the  classifier  is  the  exemplar  pattern  for 
class  number  50  after  being  passed  through  a  memory  less  binary-symmetric  channel  with  an  error  probability ,  p,  of  0.4. 
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Figure  //.  Average  number  of  iterations  until  convergence  for  the  maxnet  of  Figure  7.  Initial  values  come  from  the 
perceptron  net  presented  in  Figure  3  when  the  number  of  classes  (M)  ranges  from  2  to  100,  the  number  of  input  nodes  (N) 
is  10 M,  and  the  input  is  the  exemplar  for  one  pattern  after  being  passed  through  a  binary-symmetric  channel  with  the 
specified  probability  of  error. 


A comparison of the three different types of nets described above for picking the maximum
is presented in Table I. This table illustrates that the maxnet in Figure 7 requires the fewest
nodes to pick the maximum value. The brute force feed-forward net in Figure 4 becomes
intractable for large numbers of inputs because the number of nodes in this net grows as $O(M^2)$
in the number of inputs M. The binary tree net in Figure 6 is more reasonable; however, it still
requires roughly six times the number of nodes used in the maxnet. The maxnet should thus be
preferred whenever the deciding factor is the number of nodes required and the slight delay added
by the need to iterate is acceptable. For large numbers of inputs (M > 10), the number of
interconnects required is always greatest for the brute force net and the maxnet and least for the
binary tree. The binary tree may thus be preferred when the deciding factor is the number of
interconnects. In the following, we use the maxnet.

TABLE I

Comparison of Three Neural Nets That Pick the Maximum of M Inputs
(Node Counts Do Not Include Input or Output Nodes)

                                                                          Number of Nodes Required
Net           Structure                Types of Nodes Required           M = 10    M = 100   M = 1000
-----------   ----------------------   -------------------------------   -------   -------   --------
Brute Force   Feed-Forward             Hard-Limit                             55     5,050    500,500
Binary Tree   Feed-Forward             Hard-Limit and Threshold-Logic         54       594      5,994
Maxnet        Fully-Connected,         Threshold-Logic                        10       100      1,000
              Iterate to Convergence

A block diagram of a complete neural net classifier made up of a perceptron likelihood
calculator and a maxnet is presented in Figure 12. This complete classifier will be referred to as a
Hamming net. The Hamming net is an efficient optimum classifier made up of only threshold-logic
nodes and interconnects. It is operated by presenting a binary pattern at the input for at
least the time it takes the input to propagate to the maxnet nodes, by removing the input, and
then by waiting until the maxnet converges. After convergence, only the output node
corresponding to the selected class will be positive.
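
The two stages of the Hamming net can be sketched in a few lines. The sketch below assumes
patterns with elements +1 and -1 and a matching score equal to N minus the Hamming distance to
each exemplar. A direct `np.argmax` stands in for the iterative maxnet, and the function names are
illustrative only.

```python
import numpy as np

def matching_scores(exemplars, x):
    """Return N minus the Hamming distance from x to each exemplar.

    exemplars: (M, N) array with entries +1/-1; x: length-N array of +1/-1.
    """
    exemplars = np.asarray(exemplars)
    x = np.asarray(x)
    n = exemplars.shape[1]
    # For +/-1 vectors the dot product equals N - 2 * (Hamming distance).
    return (exemplars @ x + n) / 2

def hamming_net_classify(exemplars, x):
    """Select the class with the largest matching score (argmax replaces the maxnet)."""
    return int(np.argmax(matching_scores(exemplars, x)))

exemplars = np.array([
    [1, 1, 1, 1, 1, 1],
    [1, -1, 1, -1, 1, -1],
    [-1, -1, -1, 1, 1, 1],
])
noisy = np.array([-1, -1, 1, -1, 1, -1])  # exemplar 1 with its first element flipped
print(hamming_net_classify(exemplars, noisy))
```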


Figure 12. Complete optimum neural-net classifier referred to as a Hamming net made up of a perceptron to calculate
likelihoods, and a maxnet to select the node with the maximum likelihood. Outputs are labeled $z_0, z_1, \ldots, z_{M-2}, z_{M-1}$.

4. THE HOPFIELD CLASSIFIER


A highly connected neural net that can be used as an associative memory was described by
Hopfield in References 1 to 3. The net described in Reference 1, which uses hard-limit nodes, is
presented in Figure 13. Input values are applied at time zero to these nodes through the bottom
threshold-logic nodes. This initializes the output of each node at time zero ($\mu_i(0)$) to the
input values

$$\mu_i(0) = x_i, \quad i = 0, 1, \ldots, N-2, N-1 . \tag{12a}$$

The network then iterates using the following equation:

$$\mu_j(t+1) = f\left( \sum_{i=0}^{N-1} t_{ij}\, \mu_i(t) \right) . \tag{12b}$$


In this equation f is a modified hard-limit function and $t_{ij}$ is the weight applied to the
output of node i that feeds to node j. If we assume the elements of the input vector x take on the
values +1 and -1 for the two binary states, then f is the symmetric hard-limit function

$$f(\alpha) = \begin{cases} 1 & \text{if } \alpha > 0 \\ -1 & \text{if } \alpha \le 0 \end{cases} \tag{12c}$$


and the weights are specified by

$$t_{ij} = \sum_{s=0}^{M-1} x_i^s x_j^s , \quad i \ne j \tag{13a}$$

and

$$t_{ij} = 0, \quad i = j , \tag{13b}$$

where $x_i^s$ is element i of the exemplar for pattern s. The output of each node is fed to every other
node with a weight that is symmetric, and no node feeds back to itself. After convergence,
the output of the net is the final pattern represented by the outputs of the nodes

$$x_i' = \mu_i(\infty), \quad i = 0, 1, \ldots, N-2, N-1 . \tag{14}$$
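
Equations (12) through (14) can be illustrated with the following minimal sketch. It assumes that
all nodes are updated together at each time step and that f(0) = -1. The function names and the
iteration cap are illustrative and are not part of the original description.

```python
import numpy as np

def hopfield_weights(exemplars):
    """Build the weight matrix of Equations (13a) and (13b).

    exemplars: (M, N) array with entries +1/-1, one exemplar per row.
    """
    x = np.asarray(exemplars, dtype=float)
    t = x.T @ x               # t[i, j] = sum over s of x_i^s * x_j^s
    np.fill_diagonal(t, 0.0)  # Equation (13b): no self-feedback
    return t

def hard_limit(a):
    """Symmetric hard limiter of Equation (12c)."""
    return np.where(a > 0, 1, -1)

def hopfield_run(t, x, max_iters=100):
    """Iterate Equation (12b) until the state stops changing.

    Returns the final state and a flag saying whether a fixed point was reached.
    """
    mu = np.asarray(x).copy()
    for _ in range(max_iters):
        nxt = hard_limit(t @ mu)  # t is symmetric, so this sums t_ij * mu_i over i
        if np.array_equal(nxt, mu):
            return mu, True
        mu = nxt
    return mu, False

exemplar = np.array([[1, -1, 1, 1, -1, -1, 1, -1]])
weights = hopfield_weights(exemplar)
noisy = exemplar[0].copy()
noisy[2] = -noisy[2]          # corrupt one element of the stored pattern
final, converged = hopfield_run(weights, noisy)
print(converged, np.array_equal(final, exemplar[0]))
```

The convergence flag is returned because a simultaneous update of all nodes is not guaranteed to
settle.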

Hopfield (Reference 1) first demonstrated that when this net is trained with M exemplar patterns
using Equation (13), and an exemplar is presented at time zero, then the final pattern in the net
after convergence will be one of the exemplars with high probability if

$$M < 0.15N . \tag{15}$$

The exemplars thus form stable states of the net. Hopfield's statistical results were obtained with
randomly generated exemplars. It is possible and relatively easy to select a set of M exemplars
that satisfy Equation (15) but do not form stable states in the Hopfield net. These exemplars
must have many elements in common. When an exemplar for one of these patterns is presented
at time zero, the network doesn't converge to any of the trained exemplars. Instead it converges
to a spurious pattern never seen before. This problem of spurious states also occurs when a noisy
exemplar is presented to the net. Even when the M exemplars are stable states of the net, there is
no guarantee that noisy versions of these exemplars passed through discrete memoryless channels
and presented at time zero will converge to the original exemplar. Hopfield (Reference 1), for
example, observed that the number of spurious states found increases substantially as more and
more elements in the input exemplar are corrupted.

Figure 13. Iterative Hopfield neural net. The inputs are applied prior to time zero and then removed, and the outputs are
valid after the network converges. Nodes denoted by filled circles are symmetric hard-limit nodes. Internal thresholds in all
nodes are set to zero. Each node in the middle row connects to all other nodes but not to itself. Weights are specified by
$t_{ij}$ (solid arrows) or are +1 (open arrows).

The Hopfield neural net can be used as a classifier only when: (1) the exemplars for the
patterns to be classified form stable states and converge to themselves when presented at time
zero as input, and (2) a mechanism is provided to determine which of the M exemplars the net is
closest to after convergence. The first requirement is a necessary condition for proper operation.
The second is necessary because the Hopfield net by itself is not a neural-net classifier. It is more
like a preprocessor which still requires a classification net to select which of M exemplars a
pattern is closest to.

It is difficult to satisfy the requirement that exemplars form stable states without actually
running the Hopfield net. In general, patterns that are more random will satisfy this requirement
more easily than patterns with many bits in common. We have satisfied this requirement by
selecting patterns carefully using trial and error procedures, by using randomly generated patterns
where the Hamming distances between all patterns were within certain bounds, and by using an
orthogonalization technique described in Reference 6.

The orthogonalization technique involves generating patterns $b^k$ that are orthogonal to the
exemplars $x^s$ unless k = s. The weights in the net are then given by

$$t_{ij} = \sum_{s=0}^{M-1} x_i^s b_j^s , \quad i \ne j \tag{16a}$$

and

$$t_{ij} = 0, \quad i = j . \tag{16b}$$

Equation (16) then replaces Equation (13) as the recipe for determining weights, and the
operation of the net is otherwise the same as without orthogonalization. The $b^k$ patterns are
found from

$$B = X C^{-1} . \tag{17a}$$

In this equation B is an N by M array where each column is the orthogonal pattern $b^k$, X is an N
by M array where each column is the exemplar $x^s$, and C is an M by M correlation matrix for the
exemplars:

$$C_{ij} = \sum_{k=0}^{N-1} x_k^i x_k^j . \tag{17b}$$

The inverse in Equation (17a) will exist provided the $x^s$ exemplars are linearly independent.
When the exemplars are not linearly independent, the more general Moore-Penrose matrix
inversion technique (Reference 13) can be used.
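
The orthogonalization recipe of Equations (16) and (17) can be sketched as follows. The sketch
assumes that the exemplars are stored as the columns of an N by M array, and it uses
`numpy.linalg.pinv`, which coincides with the ordinary inverse when C is invertible. The function
name and the example patterns are illustrative only.

```python
import numpy as np

def orthogonalized_weights(x):
    """Weights of Equations (16a) and (16b) from an N-by-M exemplar array.

    x: (N, M) array with entries +1/-1, one exemplar per column.
    Returns (t, b), where b holds the patterns b^k as columns.
    """
    x = np.asarray(x, dtype=float)
    c = x.T @ x                # Equation (17b): M-by-M correlation matrix
    b = x @ np.linalg.pinv(c)  # Equation (17a), with a pseudo-inverse for safety
    t = x @ b.T                # t[i, j] = sum over s of x_i^s * b_j^s
    np.fill_diagonal(t, 0.0)   # Equation (16b)
    return t, b

# Three linearly independent exemplars of length 4, stored as columns.
x = np.array([
    [1, 1, 1],
    [1, 1, -1],
    [1, 1, 1],
    [1, -1, 1],
])
t, b = orthogonalized_weights(x)
print(np.round(b.T @ x, 6))  # b^k is orthogonal to x^s unless k = s
```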

The second requirement for using the Hopfield net as a classifier can be satisfied in two
ways. First, if a “no-match” output is allowed for spurious states, the final pattern in the net
need only be compared to each of the exemplars. In this case, a perfect match results in an
output, otherwise the output indicates a “no-match” condition. A net that performs this type of
classification is presented in Figure 14. Comparisons between the final state of the Hopfield net
and exemplars are performed using a perceptron with M nodes. The weights in the perceptron from
internal node $x_i$ to node $z_j$ are simply the elements from exemplar j:

$$w_{ij} = x_i^j \tag{18}$$

Internal offsets in the output nodes are set to $\epsilon - N$ where $\epsilon < 1$. The output of output node j will
thus be non-zero only when the final pattern in the Hopfield net exactly matches the exemplar
for pattern j.
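
The exact-match comparison of Equation (18) can be sketched as below, assuming patterns with
elements +1 and -1 and a value of ε strictly between 0 and 1. The function name, the default
ε = 0.5, and the use of `None` for the “no-match” condition are choices made for this sketch.

```python
import numpy as np

def exact_match_class(exemplars, final_state, eps=0.5):
    """Return the index of the exemplar that equals final_state, or None.

    exemplars: (M, N) array of +1/-1 values; final_state: length-N array of +1/-1.
    """
    exemplars = np.asarray(exemplars)
    n = exemplars.shape[1]
    # Weighted sum per Equation (18) plus the internal offset eps - N.
    activation = exemplars @ np.asarray(final_state) + (eps - n)
    fired = np.flatnonzero(activation > 0)  # positive only on a perfect match
    return int(fired[0]) if fired.size else None

exemplars = np.array([[1, 1, -1, -1], [1, -1, 1, -1]])
print(exact_match_class(exemplars, [1, -1, 1, -1]))  # stored exemplar
print(exact_match_class(exemplars, [1, -1, 1, 1]))   # spurious state
```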

Another technique for using the Hopfield net in a classifier eliminates the possibility of a
“no-match” output. Here, the Hamming distance between the final pattern in the net and
exemplars for all patterns is computed and the pattern with the minimum Hamming distance is
selected. This can be performed with the Hamming net classifier presented in Figure 12 when the
input to the Hamming net is the final state or output of the Hopfield net. A classifier that uses
this technique will be called a Hopfield-Hamming classifier. The Hopfield-Hamming classifier
uses the Hopfield net as a preprocessor for the Hamming net. Since the Hamming net is optimal,
the Hopfield-Hamming net is generally suboptimal and always requires more nodes than the
Hamming net.

Figure 14. Complete Hopfield neural-net classifier made up from a Hopfield net followed by a perceptron. The Hopfield
net is as in Figure 13. The perceptron is designed as in Figure 3 except all nodes are hard-limit nodes and the internal
thresholds in the final output nodes are set to $\epsilon - N$ instead of zero, where $\epsilon < 1$.

Table II compares the two classifiers that use the binary Hopfield net to the optimal Hamming
net classifier. As can be seen, the two classifiers that use Hopfield nets require more nodes
and many more interconnects than the optimal Hamming net for the normal situation when the
number of classes is less than the number of elements in each input pattern. For the examples in
the table, classifiers using the Hopfield net require almost an order of magnitude more
interconnects than the Hamming net and more nodes than the Hamming net. This is because the
Hopfield net requires $O(N^2)$ interconnects while the Hamming net requires only $O(M^2)$ interconnects
and in the examples N = 10M. In general, classifiers using the Hopfield net always require nearly
an order of magnitude more interconnects than the Hamming net because Hopfield (Reference 1)
requires M < 0.15N. When orthogonalization is used as in Reference 6 and N = M, the simpler
Hopfield classifier is roughly equivalent in complexity to the Hamming net while the more complex
Hopfield-Hamming net requires more nodes and almost twice as many interconnects as the Hamming net.
The more complex Hopfield-Hamming net is probably to be preferred over the simpler Hopfield
classifier, however, because it does not allow a “no-match” output to occur when the Hopfield
net converges to a spurious state.


TABLE II

Comparison of Three Different Neural Nets That Can Be Used to Classify M
Binary Patterns When Each Pattern Has N Elements
(Counts of Numbers of Nodes and Interconnects Do Not Include Input and Output
Nodes or Interconnects Between Major Internal Subnets)

                                                                              Nodes/Interconnects
Net                Description                   Nodes    Interconnects       M = 10, N = 100
----------------   ---------------------------   ------   -----------------   -------------------
Hopfield           Hopfield Net Followed By      2N+M     N^2+N(M-1)          210/10,900
                   Perceptron
Hopfield-Hamming   Hopfield Net Followed By      2N+2M    N^2+M^2+N(M-1)      220/11,000
                   Hamming Net
Hamming            Perceptron Followed By        N+2M     M^2+NM              120/1,100
                   Maxnet
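
The entries in the last column of Table II follow directly from the formulas in the Nodes and
Interconnects columns. The short check below evaluates those formulas for given M and N. The
function name is illustrative.

```python
def classifier_counts(m, n):
    """Node and interconnect counts from Table II for M classes and N elements."""
    return {
        "Hopfield": (2 * n + m, n**2 + n * (m - 1)),
        "Hopfield-Hamming": (2 * n + 2 * m, n**2 + m**2 + n * (m - 1)),
        "Hamming": (n + 2 * m, m**2 + n * m),
    }

for name, (nodes, interconnects) in classifier_counts(10, 100).items():
    print(f"{name}: {nodes} nodes, {interconnects} interconnects")
```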



5.  COMPARISONS  BETWEEN  HOPFIELD  CLASSIFIERS 
AND  THE  HAMMING  NET 


Two experiments were performed using 8 patterns with 120 elements each to compare the
behavior of a Hamming net classifier, a Hopfield-Hamming classifier, and a Hopfield classifier.
The first experiment used the set of visually recognizable handcrafted digit patterns shown in
Figure 15. These patterns represent seven digits plus one block pattern selected because it had a
large Hamming distance to the others. The second experiment used random patterns where each
element was +1 or -1 with a probability of 0.5. In this experiment a pattern was used if, and
only if, its Hamming distance from the already accepted exemplars fell in the range 55 to 65.
Tables III and IV list the Hamming distances for the 8 exemplars (s0 through s7) used in the
two experiments. All exemplar patterns were stable; i.e., any exemplar given as the input to a
network did not change as a result of iterating. In both experiments each of the 8 exemplars was
randomly perturbed 40 times at a specified error rate using a binary symmetric channel and
applied as the input to each of the three classifiers. This provided 320 runs per data point.
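
The procedure of the second experiment can be sketched for the minimum-Hamming-distance
classifier alone. The sketch below assumes a seeded NumPy random generator and replaces the
Hamming net by a direct minimum-distance search. The function names, the seed, and the cap on
the number of candidate patterns are illustrative, and the sketch does not reproduce the reported
results.

```python
import numpy as np

def random_exemplars(rng, m=8, n=120, lo=55, hi=65, max_tries=100_000):
    """Draw +/-1 patterns, keeping a candidate only if its Hamming distance
    to every already accepted exemplar lies in the range lo to hi."""
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == m:
            break
        cand = rng.choice([-1, 1], size=n)
        if all(lo <= np.count_nonzero(cand != e) <= hi for e in accepted):
            accepted.append(cand)
    return np.array(accepted)

def binary_symmetric_channel(rng, x, p):
    """Flip each element of x independently with probability p."""
    flips = rng.random(x.shape) < p
    return np.where(flips, -x, x)

def percent_correct(rng, exemplars, p, trials=40):
    """Perturb each exemplar `trials` times and classify by minimum Hamming distance."""
    correct = 0
    for label, ex in enumerate(exemplars):
        for _ in range(trials):
            noisy = binary_symmetric_channel(rng, ex, p)
            distances = np.count_nonzero(exemplars != noisy, axis=1)
            correct += int(np.argmin(distances) == label)
    return 100.0 * correct / (trials * len(exemplars))

rng = np.random.default_rng(0)
exemplars = random_exemplars(rng)
for p in (0.0, 0.1, 0.2, 0.3, 0.4):
    print(p, percent_correct(rng, exemplars, p))
```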

Figure 15. Digit patterns used in Experiment No. 1.

TABLE III

Hamming Distances Between Digit Patterns Used in Experiment No. 1

       s0   s1   s2   s3   s4   s5   s6   s7
s0      0   48   64   58   48   74   58   74
s1     48    0   54   50   78   54   54   54
s2     64   54    0   36   56   30   60   66
s3     58   50   36    0   36   60   62   36
s4     48   78   56   36    0   66   48   52
s5     74   54   30   60   66    0   42   84
s6     58   54   60   62   48   42    0   72
s7     74   54   66   36   52   84   72    0

TABLE IV

Hamming Distances Between Random Patterns Used in Experiment No. 2

       s0   s1   s2   s3   s4   s5   s6   s7
s0      0   62   59   61   57   59   64   63
s1     62    0   65   57   63   61   64   59
s2     59   65    0   62   58   58   55   58
s3     61   57   62    0   60   58   57   58
s4     57   63   58   60    0   56   65   56
s5     59   61   58   58   56    0   65   62
s6     64   64   55   57   65   65    0   61
s7     63   59   58   58   56   62   61    0

The results of the first experiment using digit patterns are presented in Figure 16a. Percent
correct in this and other figures was computed by counting a trial as correct only when the
correct input pattern was selected. It can be seen that the Hopfield net fails immediately at an
error rate of .025 and that the Hopfield-Hamming net begins to fail at the error rate of .150,
whereas the Hamming net is always correct until an error rate of .300. The results of the second
experiment using random patterns are presented in Figure 16b. Here, the Hamming net begins failing
at an error rate of about .300 as before, but the Hopfield-Hamming net survives until an error
rate of .250 as opposed to .150, and the Hopfield net fails at .250 instead of .025, a marked
improvement. Clearly, the performance of the Hopfield net depends strongly on whether the Hamming
distances among the stored states are “well-behaved”. In general, the straight Hamming net yields
the best overall results.

Figure 16. Results of Experiment No. 1 obtained using digit patterns (a) and of Experiment No. 2 obtained using random
patterns (b), with a Hopfield net without orthogonalization.

Hopfield (Reference 1) suggested that the Hopfield net could be used as a bibliographic retrieval
system. A third experiment was performed to test this conjecture. A network of 11 exemplars, each
with 168 elements, was created from a bibliography based on the references in Hopfield's paper
(Reference 1). Two 16-character fields were used for the name (including two initials) and the
publication, and a 2-character field was used for the date. The 16-character fields consisted of
the letters “a” through “z” and the character “_” to mark the end of the character string if
shorter than 16 characters. The 2-character field contained the last two digits of a 20th century
date (0 through 9). Characters were represented by 5-bit bytes and digits by 4-bit bytes using
ASCII-like representations. This resulted in a total of 168 bits. In order to maintain desirable
Hamming distances, it was necessary to “pad out” a field with “unique” data when it was bigger
than its entry. A simple reflection of the entry was performed to meet this requirement. The format
is shown in Table V, which contains the 11 exemplars which were selected from a list of 27
references. References were selected only if they satisfied a range of Hamming distances. This
range was initially set to a minimum and had to be relaxed until 11 references were found to meet
the criterion. The resulting Hamming distances covered the range from 71 through 93, a reasonable
variation from the expected distance of 84 which random patterns would have produced. In all but
one case, the entire information was exactly retrievable from the input of either the name with
initials, the name without initials, or the publication. It was observed that a straight Hamming
net would have sufficed.

TABLE V

Exemplar Bibliography Patterns Used in Experiment No. 3

jclonguethiggensprocroysoclondon68
g_willwacher_rehbiolocybernetics76
vbmountcastle_elthemindfulbrain_78
jaanderson_nosrepsychologyreview77
wbkristan_natsirinformationproce80
bwknight_thgink_lectmathlifescie75
idharmon_nomrah_neuraltheorymode64
wsmcculloch_hcolbulletinmathbiop43
m_minsky_yksnim_perceptrons_snor69
dohebb_bbeh_hebborganizationbeha49
psgoldman_namdlobrainresearch_hc77
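
The record format can be made concrete with a small encoder. The text specifies only 5-bit
characters, 4-bit digits, and an ASCII-like representation, so the particular code assignment
below (letters “a” through “z” numbered 0 through 25, “_” numbered 26, digits in plain binary)
is a hypothetical choice made for this sketch, as is the function name.

```python
import string

ALPHABET = string.ascii_lowercase + "_"  # 27 symbols; the code assignment is assumed

def encode_record(record):
    """Encode a 34-character record (16 + 16 characters, 2 digits) as 168 values of +1/-1."""
    if len(record) != 34:
        raise ValueError("expected 16 + 16 + 2 characters")
    bits = []
    for ch in record[:32]:
        bits += [int(b) for b in format(ALPHABET.index(ch), "05b")]  # 5 bits per character
    for ch in record[32:]:
        bits += [int(b) for b in format(int(ch), "04b")]             # 4 bits per digit
    return [2 * b - 1 for b in bits]  # map bit 0 to -1 and bit 1 to +1

pattern = encode_record("jclonguethiggensprocroysoclondon68")
print(len(pattern))
```

The length is 2 × 16 × 5 + 2 × 4 = 168 elements, in agreement with the pattern size stated above.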

Orthogonalization, described in the previous section, has two positive effects on a Hopfield
net: (1) it improves the performance of an already stable network, and (2) it allows the number
of exemplars to be increased without losing stability as long as M ≤ N. Experiment No. 1 was
repeated after orthogonalizing the net. This experiment involved the handcrafted digit patterns
with a less than ideal set of Hamming distances. A dramatic improvement in the performance of
the Hopfield net may be seen in Figure 17a. The Hopfield-Hamming net also improved in
performance, as shown in Figure 17b.

Following  the  initial  orthogonalization  experiments  we  increased  the  number  of  exemplars  to 
16,  dropping  the  square  pattern  and  including  all  the  hexadecimal  digits  (0  through  F).  These  16 
exemplars  remained  stable.  Performance  statistics  were  gathered  for  error  rates  of  .100  through 
.300;  and,  as  we  expected,  the  Hopfield  and  Hopfield-Hamming  nets  suffered  a  degradation  in 
performance  as  illustrated  in  Figure  18a  and  Figure  18b,  respectively. 

The bibliography retrieval Hopfield net was also orthogonalized and expanded to include all
27 entries from Hopfield's paper. Hamming distances covered the range from 51 through 100,
excluding three smaller distances that arose because three references had the same publication
entry. All entries were retrievable from only the last name and initials. All entries but one were
retrievable from only the last name. Four entries were not retrievable by publication alone, three
of which shared the same publication and one of which differed from another publication by
only three characters.

Figure 17. Hopfield net (a) and Hopfield-Hamming net (b) performance with and without orthogonalization using digit
patterns from Experiment No. 1.

... digits and using 8 digit patterns as exemplars.


6.  CONCLUSIONS 


Two neural network approaches have been compared as they apply to several pattern
classification problems. First, it was shown that for patterns consisting of statistically
independent binary components, an optimum classifier can be configured as a single-layer perceptron
followed by a densely-connected neuron-like net that labels the most likely stored patterns. We
called this classifier a Hamming net and showed several methods of implementing this algorithm. We
also compared implementation complexity of the Hamming net and a Hopfield net and concluded
that the Hamming net was, in principle, a simpler device than a fully-connected Hopfield net.
This was followed by a brief review of the Hopfield model and the discussion of several
techniques for enhancing performance of this model. Finally, we presented experimental comparisons
for three types of input data. For these data, it was seen that the storage prescription of
Reference 1 yielded poorer performance than did the Hamming net. By carefully choosing the stored
states and by orthogonalization, Hopfield net performance became comparable to but never better
than Hamming net performance for the examples chosen. Further research is needed to compare such
systems when the neural components are analog rather than digital. Further research should also
examine the performance of neural classification nets with analog, continuously variable, inputs.